<!DOCTYPE html>
<html lang="en">
    <!--
      Licensed to the Apache Software Foundation (ASF) under one or more
      contributor license agreements.  See the NOTICE file distributed with
      this work for additional information regarding copyright ownership.
      The ASF licenses this file to You under the Apache License, Version 2.0
      (the "License"); you may not use this file except in compliance with
      the License.  You may obtain a copy of the License at
          http://www.apache.org/licenses/LICENSE-2.0
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.
    -->
    <head>
        <meta charset="utf-8" />
        <title>PartitionRecord</title>

        <link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css" />
    </head>

    <body>
    	<p>
    		PartitionRecord allows the user to separate out records in a FlowFile such that each outgoing FlowFile
    		consists only of records that are "alike." To define what it means for two records to be alike, the Processor
    		makes use of NiFi's <a href="../../../../../html/record-path-guide.html">RecordPath</a> DSL.
    	</p>
    	
    	<p>
    		In order to make the Processor valid, at least one user-defined property must be added to the Processor.
    		The value of the property must be a valid RecordPath. Expression Language is supported and will be evaluated before
    		attempting to compile the RecordPath. However, if Expression Language is used, the Processor is not able to validate
    		the RecordPath before-hand and may result in having FlowFiles fail processing if the RecordPath is not valid when being
    		used.
    	</p>
    	
    	<p>
    		Once one or more RecordPath's have been added, those RecordPath's are evaluated against each Record in an incoming FlowFile.
    		In order for Record A and Record B to be considered "like records," both of them must have the same value for all RecordPath's
    		that are configured. Only the values that are returned by the RecordPath are held in Java's heap. The records themselves are written
    		immediately to the FlowFile content. This means that for most cases, heap usage is not a concern. However, if the RecordPath points
    		to a large Record field that is different for each record in a FlowFile, then heap usage may be an important consideration. In such
    		cases, SplitRecord may be useful to split a large FlowFile into smaller FlowFiles before partitioning.
    	</p>
    	
    	<p>
    		Once a FlowFile has been written, we know that all of the Records within that FlowFile have the same value for the fields that are
    		described by the configured RecordPath's. As a result, this means that we can promote those values to FlowFile Attributes. We do so
    		by looking at the name of the property to which each RecordPath belongs. For example, if we have a property named <code>country</code>
    		with a value of <code>/geo/country/name</code>, then each outbound FlowFile will have an attribute named <code>country</code> with the
    		value of the <code>/geo/country/name</code> field. The addition of these attributes makes it very easy to perform tasks such as routing,
    		or referencing the value in another Processor that can be used for configuring where to send the data, etc.
    		However, for any RecordPath whose value is not a scalar value (i.e., the value is of type Array, Map, or Record), no attribute will be added.
    	</p>
    	
    	
    	
    	<h2>Examples</h2>
    	
    	<p>
    		To better understand how this Processor works, we will lay out a few examples. For the sake of these examples, let's assume that our input
    		data is JSON formatted and looks like this:
    	</p>

<code>
<pre>
[ {
  "name": "John Doe",
  "dob": "11/30/1976",
  "favorites": [ "spaghetti", "basketball", "blue" ],
  "locations": {
  	"home": {
  		"number": 123,
  		"street": "My Street",
  		"city": "New York",
  		"state": "NY",
  		"country": "US"
  	},
  	"work": {
  		"number": 321,
  		"street": "Your Street",
  		"city": "New York",
  		"state": "NY",
  		"country": "US"
  	}
  }
}, {
  "name": "Jane Doe",
  "dob": "10/04/1979",
  "favorites": [ "spaghetti", "football", "red" ],
  "locations": {
  	"home": {
  		"number": 123,
  		"street": "My Street",
  		"city": "New York",
  		"state": "NY",
  		"country": "US"
  	},
  	"work": {
  		"number": 456,
  		"street": "Our Street",
  		"city": "New York",
  		"state": "NY",
  		"country": "US"
  	}
  }
}, {
  "name": "Jacob Doe",
  "dob": "04/02/2012",
  "favorites": [ "chocolate", "running", "yellow" ],
  "locations": {
  	"home": {
  		"number": 123,
  		"street": "My Street",
  		"city": "New York",
  		"state": "NY",
  		"country": "US"
  	},
  	"work": null
  }
}, {
  "name": "Janet Doe",
  "dob": "02/14/2007",
  "favorites": [ "spaghetti", "reading", "white" ],
  "locations": {
  	"home": {
  		"number": 1111,
  		"street": "Far Away",
  		"city": "San Francisco",
  		"state": "CA",
  		"country": "US"
  	},
  	"work": null
  }
}]
</pre>
</code>


    	<h3>Example 1 - Partition By Simple Field</h3>
    	
    	<p>
    		For a simple case, let's partition all of the records based on the state that they live in.
    		We can add a property named <code>state</code> with a value of <code>/locations/home/state</code>.
    		The result will be that we will have two outbound FlowFiles. The first will contain an attribute with the name
    		<code>state</code> and a value of <code>NY</code>. This FlowFile will consist of 3 records: John Doe, Jane Doe, and Jacob Doe.
    		The second FlowFile will consist of a single record for Janet Doe and will contain an attribute named <code>state</code> that
    		has a value of <code>CA</code>.
    	</p>
    	
    	
    	<h3>Example 2 - Partition By Nullable Value</h3>
    	
    	<p>
    		In the above example, there are three different values for the work location. If we use a RecordPath of <code>/locations/work/state</code>
    		with a property name of <code>state</code>, then we will end up with two different FlowFiles. The first will contain records for John Doe and Jane Doe
    		because they have the same value for the given RecordPath. This FlowFile will have an attribute named <code>state</code> with a value of <code>NY</code>.
    	</p>
    	<p>
    		The second FlowFile will contain the two records for Jacob Doe and Janet Doe, because the RecordPath will evaluate
    		to <code>null</code> for both of them.  This FlowFile will have no <code>state</code> attribute (unless such an attribute existed on the incoming FlowFile,
    		in which case its value will be unaltered).
    	</p>
    	
    	
    	<h3>Example 3 - Partition By Multiple Values</h3>
    	
    	<p>
    		Now let's say that we want to partition records based on multiple different fields. We now add two properties to the PartitionRecord processor.
    		The first property is named <code>home</code> and has a value of <code>/locations/home</code>. The second property is named <code>favorite.food</code>
    		and has a value of <code>/favorites[0]</code> to reference the first element in the "favorites" array. 
    	</p>
    	
    	<p>
    		This will result in three different FlowFiles being created. The first FlowFile will contain records for John Doe and Jane Doe. If will contain an attribute
    		named "favorite.food" with a value of "spaghetti." However, because the second RecordPath pointed to a Record field, no "home" attribute will be added.
    		In this case, both of these records have the same value for both the first element of the "favorites" array
    		and the same value for the home address. Janet Doe has the same value for the first element in the "favorites" array but has a different home address. Similarly,
    		Jacob Doe has the same home address but a different value for the favorite food.
    	</p>
    	
    	<p>
    		The second FlowFile will consist of a single record: Jacob Doe. This FlowFile will have an attribute named "favorite.food" with a value of "chocolate."
    		The third FlowFile will consist of a single record: Janet Doe. This FlowFile will have an attribute named "favorite.food" with a value of "spaghetti."
    	</p>
    	
	</body>
</html>